
Keyword Search Result

[Keyword] deep learning (149 hits)

Results 41-60 (149 hits)

  • A Bus Crowdedness Sensing System Using Deep-Learning Based Object Detection

    Wenhao HUANG  Akira TSUGE  Yin CHEN  Tadashi OKOSHI  Jin NAKAZAWA  

     
    PAPER

      Publicized:
    2022/06/23
      Vol:
    E105-D No:10
      Page(s):
    1712-1720

    Crowdedness of buses is playing an increasingly important role in COVID-19 disease control, yet the lack of a practical approach to sensing bus crowdedness is a major problem. This paper proposes a bus crowdedness sensing system that exploits deep learning-based object detection to count the passengers getting on and off a bus and thus estimate its crowdedness in real time. In our prototype system, we combine the YOLOv5s object detection model with the Kalman filter object tracking algorithm to implement a sensing algorithm running on a Jetson Nano-based vehicular device mounted on a bus. Using driving recorder video data taken from a real bus, we experimentally evaluate the proposed sensing system and verify that it improves counting accuracy and achieves real-time processing on the Jetson Nano platform.
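    The abstract pairs YOLOv5s detections with Kalman filter tracking to count boarding and alighting passengers. As an illustration only, the sketch below smooths one coordinate of a tracked box centre with a textbook constant-velocity Kalman filter and counts crossings of a virtual door line. The detector is omitted, and the names `KalmanCV1D` and `count_on_off`, the noise values, and the "larger y means boarding" convention are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class KalmanCV1D:
    """Constant-velocity Kalman filter on one image coordinate.

    State x = [position, velocity]; only the position is measured.
    The noise values q and r are illustrative assumptions.
    """
    x: list
    P: list = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])
    q: float = 0.01  # process noise added to the diagonal of P
    r: float = 1.0   # measurement noise variance

    def predict(self, dt: float = 1.0) -> float:
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + q*I with F = [[1, dt], [0, 1]]
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z: float) -> float:
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation variance (H = [1, 0])
        k0, k1 = a / s, c / s   # Kalman gain
        y = z - self.x[0]       # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.x[0]


def count_on_off(tracks, line_y):
    """Count crossings of a virtual door line by each smoothed track.

    Hypothetical convention: moving toward larger y means boarding.
    """
    on = off = 0
    for ys in tracks:
        kf = KalmanCV1D(x=[float(ys[0]), 0.0])
        prev = kf.x[0]
        for z in ys[1:]:
            kf.predict()
            cur = kf.update(float(z))
            if prev < line_y <= cur:
                on += 1
            elif cur < line_y <= prev:
                off += 1
            prev = cur
    return on, off


print(count_on_off([[0, 10, 20, 30, 40, 50, 60], [60, 50, 40, 30, 20, 10, 0]], 25))  # -> (1, 1)
```

    Smoothing the track before testing the line crossing is what keeps a jittery detection from being counted several times; a real system would also need per-detection track association, which is not shown.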

  • Single Suction Grasp Detection for Symmetric Objects Using Shallow Networks Trained with Synthetic Data

    Suraj Prakash PATTAR  Tsubasa HIRAKAWA  Takayoshi YAMASHITA  Tetsuya SAWANOBORI  Hironobu FUJIYOSHI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/06/21
      Vol:
    E105-D No:9
      Page(s):
    1600-1609

    Predicting the grasping point accurately and quickly is crucial for successful robotic manipulation. However, to commercially deploy a robot, such as a dishwasher robot in a commercial kitchen, we also need to consider the constraints of limited usable resources. We present a deep learning method to predict the grasp position when using a single suction gripper for picking up objects. The proposed method is based on a shallow network to enable lower training costs and efficient inference on limited resources. Costs are further reduced by collecting data in a custom-built synthetic environment. For evaluating the proposed method, we developed a system that models a commercial kitchen for a dishwasher robot to manipulate symmetric objects. We tested our method against a model-fitting method and an algorithm-based method in our developed commercial kitchen environment and found that a shallow network trained with only the synthetic data achieves high accuracy. We also demonstrate the practicality of using a shallow network in sequence with an object detector for ease of training, prediction speed, low computation cost, and easier debugging.

  • MSFF: A Multi-Scale Feature Fusion Network for Surface Defect Detection of Aluminum Profiles

    Lianshan SUN  Jingxue WEI  Hanchao DU  Yongbin ZHANG  Lifeng HE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2022/05/30
      Vol:
    E105-D No:9
      Page(s):
    1652-1655

    This paper presents an improved YOLOv3 network, named MSFF-YOLOv3, for precisely detecting the variable surface defects of aluminum profiles in practice. First, we introduce a larger prediction scale to provide detailed information for small defect detection; second, we design an efficient attention-guided block to extract more defect features with less overhead; third, we design a bottom-up pyramid and integrate it with the existing feature pyramid network to construct a twin-tower structure that improves the circulation and fusion of features across layers. In addition, we employ the K-median algorithm for anchor clustering to speed up network inference. Experimental results showed that the mean average precision of MSFF-YOLOv3 is higher than that of all the conventional networks compared for surface defect detection of aluminum profiles. Moreover, the number of frames MSFF-YOLOv3 processes per second can meet real-time requirements.
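    K-median anchor clustering can be sketched as follows. This assumes the 1 - IoU distance on (width, height) pairs that is commonly used for YOLO anchors, with per-dimension medians as cluster centres; the distance, the deterministic initialization, and the function names are assumptions rather than details given in the letter.

```python
from statistics import median


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmedian_anchors(boxes, k, iters=100):
    """Cluster (w, h) boxes into k anchors using distance 1 - IoU and
    per-dimension medians as cluster centres."""
    # Deterministic init: k boxes spread evenly over the area-sorted list.
    by_area = sorted(boxes, key=lambda wh: wh[0] * wh[1])
    centers = [by_area[i * len(by_area) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # smallest 1 - IoU distance == largest IoU
            j = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[j].append(b)
        new_centers = [
            (median(w for w, _ in c), median(h for _, h in c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


boxes = [(9, 10), (10, 10), (11, 10), (10, 11),
         (100, 100), (98, 102), (102, 98), (100, 101)]
print(kmedian_anchors(boxes, 2))
```

    Using medians instead of means makes each anchor robust to a few unusually shaped defect boxes in a cluster.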

  • BFF R-CNN: Balanced Feature Fusion for Object Detection

    Hongzhe LIU  Ningwei WANG  Xuewei LI  Cheng XU  Yaze LI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/05/17
      Vol:
    E105-D No:8
      Page(s):
    1472-1480

    In the neck of a two-stage object detection network, feature fusion is generally carried out in either a top-down or a bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model, and gradient imbalance in the region of interest extraction layer due to changes in object scale. The deeper the network, the more abstract the learned features; that is, more semantic information can be extracted, but less image background, spatial location, and other resolution information is retained. In contrast, the shallow layers learn little semantic information but a lot of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck, and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. In combination with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method significantly improves network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) 1.9 points and 3.2 points higher than those of the Feature Pyramid Network (FPN) Faster R-CNN framework and the Generic Region of Interest Extractor (GRoIE) framework, respectively.

  • Loan Default Prediction with Deep Learning and Muddling Label Regularization

    Weiwei JIANG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2022/04/04
      Vol:
    E105-D No:7
      Page(s):
    1340-1342

    Loan default prediction has been a significant problem in the financial domain because overdue loans may incur significant losses. Machine learning methods have been introduced to solve this problem, but many challenges remain, including feature multicollinearity, imbalanced labels, and small data samples. To replicate the success of deep learning in many areas, this letter introduces an effective regularization technique named muddling label regularization and proposes an ensemble of feed-forward neural networks, which outperforms machine learning and deep learning baselines on a real-world dataset.

  • A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU (Open Access)

    Kentaro KAWAKAMI  Kouji KURIHARA  Masafumi YAMAZAKI  Takumi HONDA  Naoto FUKUMOTO  

     
    PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    222-231

    To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture, whereas the A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically generates the execution code for its computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for the x86_64 architecture. To port oneDNN to A64FX, this code must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the code to be rewritten exceeds several tens of thousands of lines. This study presents Xbyak_translator_aarch64, a binary translator that, at runtime, converts dynamically produced executable code for the x86_64 architecture into executable code for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code to port oneDNN to A64FX and allows us to port it quickly.

  • Facial Recognition of Dairy Cattle Based on Improved Convolutional Neural Network

    Zhi WENG  Longzhen FAN  Yong ZHANG  Zhiqiang ZHENG  Caili GONG  Zhongyue WEI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2022/03/02
      Vol:
    E105-D No:6
      Page(s):
    1234-1238

    As the basis of fine breeding management and animal husbandry insurance, individual recognition of dairy cattle is an important issue in animal husbandry management. Traditional cow identification methods have limitations, such as being easy to lose and to falsify, so they can no longer meet the needs of modern intelligent pasture management. In recent years, with the rise of computer vision technology, deep learning has developed rapidly in the field of face recognition: its recognition accuracy has surpassed that of humans, and it has been widely used in production environments. However, research on the facial recognition of large livestock, such as dairy cattle, still needs to be developed and improved. Following the idea of a residual network, this letter proposes an improved convolutional neural network (Res_5_2Net) for individual dairy cow recognition based on dairy cow facial images. The recognition accuracy on our self-built cow face database (3012 training samples, 1536 test samples) reaches 94.53%. The experimental results show that the efficiency of dairy cow identification is effectively improved.

  • Localization of Pointed-At Word in Printed Documents via a Single Neural Network

    Rubin ZHAO  Xiaolong ZHENG  Zhihua YING  Lingyan FAN  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/01/26
      Vol:
    E105-D No:5
      Page(s):
    1075-1084

    Most existing object detection and text detection methods are designed to detect either text or objects. In scenarios where the task is to find the target word pointed at by an object, the results of existing methods are far from satisfactory. Yet such scenarios occur often in human-computer interaction, when the computer needs to figure out which word the user is pointing at. Compared with object detection, pointed-at word localization (PAWL) requires higher accuracy, especially in dense text scenarios. Moreover, characters in printed documents are much smaller than those in scene text detection datasets such as ICDAR-2013, ICDAR-2015, and ICPR-2018. To address these problems, the authors propose a novel target word localization network (TWLN) to detect the pointed-at word in printed documents. In this work, a single deep neural network is trained to extract the features of markers and text sequentially. For each image, the location of the marker is predicted first; according to the predicted location, a smaller image is cropped from the original image and fed into the same network, which then predicts the location of the pointed-at word. To train and test the network, an efficient approach is proposed to generate the dataset from PDF documents by inserting markers pointing at words in the documents, which avoids laborious labeling work. Experiments on the proposed dataset demonstrate that TWLN outperforms the compared object detection method and optical character recognition method on every category of targets, especially when the target is a single character occupying only a few pixels in the image. TWLN is also tested on real photographs, and the accuracy shows no significant difference, which supports the validity of the dataset generation method.

  • Accuracy Improvement in DOA Estimation with Deep Learning (Open Access)

    Yuya KASE  Toshihiko NISHIMURA  Takeo OHGANE  Yasutaka OGAWA  Takanori SATO  Yoshihisa KISHIYAMA  

     
    PAPER-Antennas and Propagation

      Publicized:
    2021/12/01
      Vol:
    E105-B No:5
      Page(s):
    588-599

    Direction of arrival (DOA) estimation of wireless signals is required in many applications. In addition to classical methods such as MUSIC and ESPRIT, non-linear algorithms such as compressed sensing have recently become common subjects of study. Deep learning, or machine learning more broadly, is also a non-linear approach and has been applied in various fields. DOA estimation using deep learning is generally classified as on-grid estimation. A major problem of on-grid estimation is that accuracy may degrade when the DOA is near a grid boundary. To reduce such estimation errors, we propose a method that combines two DNNs whose grids are offset by half the grid size. Simulation results show that our proposal outperforms MUSIC, a typical off-grid estimation method. Furthermore, a DNN specially trained for the case of close DOAs is shown to achieve very high accuracy in that case compared with MUSIC.
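    The motivation for offsetting two grids by half a grid size can be checked numerically. The sketch below measures only the quantization error of on-grid estimates and assumes an idealized selector that always picks the better of the two grids; how the paper actually combines the two DNN outputs is not described in the abstract, and the 2-degree grid is an arbitrary choice.

```python
def grid_error(theta, step, offset=0.0):
    """Distance (deg) from theta to the nearest point of the grid {offset + n*step}."""
    r = (theta - offset) % step
    return min(r, step - r)


def worst_case_errors(step, thetas):
    """Worst-case quantization error of one grid vs. the better of two grids
    offset by half a step (an idealized selector, not the paper's combiner)."""
    single = max(grid_error(t, step) for t in thetas)
    paired = max(min(grid_error(t, step), grid_error(t, step, step / 2)) for t in thetas)
    return single, paired


thetas = [-60 + 0.01 * i for i in range(12001)]  # -60 to +60 degrees
single, paired = worst_case_errors(2.0, thetas)
print(round(single, 2), round(paired, 2))  # -> 1.0 0.5
```

    With a single grid, the worst case (a DOA on a cell boundary) is half the grid size; with the half-offset pair, every DOA lies within a quarter of the grid size of some grid point.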

  • Anomaly Detection Using Spatio-Temporal Context Learned by Video Clip Sorting

    Wen SHAO  Rei KAWAKAMI  Takeshi NAEMURA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/02/08
      Vol:
    E105-D No:5
      Page(s):
    1094-1102

    Previous studies on anomaly detection in videos have trained detectors to perform reconstruction and prediction tasks on normal data, so that frames on which task performance is low are detected as anomalies during testing. This paper proposes a new approach based on sorting video clips, using a generative network structure. Our approach learns spatial contexts from appearance and temporal contexts from the order relationship of the frames. Experiments were conducted on four datasets, and we categorized the anomalous sequences by appearance and by motion. Evaluations were conducted not only on each dataset as a whole but also on each category. Our method improved detection performance both on anomalies that differ from normal data in appearance and on those that differ in motion. Moreover, combining our approach with a prediction method improved precision at high recall.
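    A clip-sorting pretext task needs training pairs of shuffled segments and the permutation that restores them. The sketch below builds such a pair from a list of frames; the segment count, the use of a permutation index as the target, and the function names are assumptions, and the generative network itself is not shown. At test time, one plausible (unconfirmed) anomaly score is how badly the trained network sorts a clip.

```python
import itertools
import random


def make_sorting_sample(frames, n_segments, rng):
    """Split a clip into equal segments, shuffle them, and return the shuffled
    segments with the index of the permutation used (the training target)."""
    seg = len(frames) // n_segments
    chunks = [frames[i * seg:(i + 1) * seg] for i in range(n_segments)]
    perms = list(itertools.permutations(range(n_segments)))
    label = rng.randrange(len(perms))
    shuffled = [chunks[j] for j in perms[label]]
    return shuffled, label, perms


def unsort(shuffled, order):
    """Invert the shuffle: position p of `shuffled` holds original chunk order[p]."""
    restored = [None] * len(order)
    for pos, j in enumerate(order):
        restored[j] = shuffled[pos]
    return restored


rng = random.Random(0)
frames = list(range(12))  # stand-in for 12 video frames
shuffled, label, perms = make_sorting_sample(frames, 3, rng)
restored = unsort(shuffled, perms[label])
print(sum(restored, []) == frames)  # -> True
```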

  • Sea Clutter Image Segmentation Method of High Frequency Surface Wave Radar Based on the Improved Deeplab Network

    Haotian CHEN  Sukhoon LEE  Di YAO  Dongwon JEONG  

     
    LETTER-Digital Signal Processing

      Publicized:
    2021/10/12
      Vol:
    E105-A No:4
      Page(s):
    730-733

    High Frequency Surface Wave Radar (HFSWR) achieves over-the-horizon detection: it can effectively detect and track ships and ultra-low-altitude aircraft and acquire sea state information such as icebergs and ocean currents. However, HFSWR is seriously affected by clutter, especially sea clutter and ionospheric clutter. In this paper, we propose a deep learning image semantic segmentation method based on an optimized Deeplabv3+ network to automatically detect sea clutter and ionospheric clutter, using measured range-Doppler (R-D) spectrum images of HFSWR collected during a typhoon as experimental data. The method avoids the drawback of traditional detection methods, which require a large amount of a priori knowledge, and provides a basis for subsequent clutter suppression or research on clutter characteristics.

  • The Ratio of the Desired Parameters of Deep Neural Networks

    Yasushi ESAKI  Yuta NAKAHARA  Toshiyasu MATSUSHIMA  

     
    LETTER-Neural Networks and Bioengineering

      Publicized:
    2021/10/08
      Vol:
    E105-A No:3
      Page(s):
    433-435

    Some researchers have investigated how accurately a deep neural network can approximate a function that describes the generating pattern of data. However, they have only confirmed whether at least one function close to the generating-pattern function exists in the function classes of deep neural networks as their parameter values vary. We therefore propose a new criterion for inferring approximation accuracy: the ratio of functions close to the generating-pattern function that exist in the function classes. Moreover, through numerical simulations, we show that under the proposed criterion, a deep neural network with more layers approximates the generating-pattern function more accurately than one with fewer layers.

  • Consistency Regularization on Clean Samples for Learning with Noisy Labels

    Yuichiro NOMURA  Takio KURITA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2021/10/28
      Vol:
    E105-D No:2
      Page(s):
    387-395

    In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, these data collection techniques risk generating incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, its generalization performance decreases significantly. This problem is called Learning with Noisy Labels (LNL). One recent study on LNL, DivideMix [1], successfully divides the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian Mixture Model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the model risks overfitting to simple patterns during training. To train the classification model without such overfitting, we propose introducing consistency regularization on the samples selected by the GMM. Consistency regularization perturbs input images and encourages the model to output the same values for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results comparable to or better than those of the state-of-the-art method.
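    The DivideMix-style sample selection described above fits a two-component GMM to per-sample losses and treats the low-loss component as clean. A minimal pure-Python sketch follows, together with a mean-squared consistency term between predictions on an image and its perturbation. The initialization, the 0.5 threshold, and the squared-error form of the consistency loss are assumptions, not the settings used in the paper.

```python
import math


def _weighted_densities(x, mu, var, pi):
    """pi_k * N(x | mu_k, var_k) for both components."""
    return [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
            / math.sqrt(2 * math.pi * var[k]) for k in range(2)]


def fit_gmm_1d_2(xs, iters=100, floor=1e-6):
    """EM for a two-component 1-D Gaussian mixture; returns (means, variances, weights)."""
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]
    var = [((hi - lo) / 4) ** 2 + floor] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in xs:
            p = _weighted_densities(x, mu, var, pi)
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate means, variances and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk + floor
            pi[k] = nk / len(xs)
    return mu, var, pi


def select_clean(losses, threshold=0.5):
    """Indices whose posterior under the low-loss component exceeds threshold."""
    mu, var, pi = fit_gmm_1d_2(losses)
    clean = 0 if mu[0] <= mu[1] else 1
    out = []
    for i, x in enumerate(losses):
        p = _weighted_densities(x, mu, var, pi)
        if p[clean] / (sum(p) or 1e-300) > threshold:
            out.append(i)
    return out


def consistency_loss(p_orig, p_pert):
    """Mean squared difference between predictions on an image and its perturbation."""
    return sum((a - b) ** 2 for a, b in zip(p_orig, p_pert)) / len(p_orig)


losses = [0.1, 0.12, 0.08, 0.11, 0.09, 2.0, 2.2, 1.9, 2.1]
print(select_clean(losses))  # -> [0, 1, 2, 3, 4]
```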

  • Learning Pyramidal Feature Hierarchy for 3D Reconstruction

    Fairuz Safwan MAHAD  Masakazu IWAMURA  Koichi KISE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/11/16
      Vol:
    E105-D No:2
      Page(s):
    446-449

    Neural network-based three-dimensional (3D) reconstruction methods have produced promising results. However, they do not pay particular attention to reconstructing detailed parts of objects. This occurs because the network is not designed to capture the fine details of objects. In this paper, we propose a network designed to capture both the coarse and fine details of objects to improve the reconstruction of the fine parts of objects.

  • An FPGA-Based Optimizer Design for Distributed Deep Learning with Multiple GPUs

    Tomoya ITSUBO  Michihiro KOIBUCHI  Hideharu AMANO  Hiroki MATSUTANI  

     
    PAPER

      Publicized:
    2021/07/01
      Vol:
    E104-D No:12
      Page(s):
    2057-2067

    Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are efficient, especially for the training phase. A cluster of computers, each equipped with multiple GPUs, can significantly accelerate deep learning workloads. More specifically, a back-propagation algorithm following a gradient descent approach is used for training. Although gradient computation is still a major bottleneck of training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced to further shorten training time. To address this issue, in this paper, multiple GPUs are interconnected with PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected through network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between the remote GPUs; thus, gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on a NetFPGA-SUME board. Their resource utilization is increased by the processing elements (PEs) for the optimizers, which consume up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared with CPU and GPU implementations, respectively. Also, the gradient aggregation throughput of the FPGA-based switch reaches up to 98.3% of the 10GbE line rate.
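    The computation offloaded to the switch amounts to averaging the gradients from each GPU and applying an optimizer step. The sketch below shows that arithmetic in plain Python with the standard Adam update rule, one of the four optimizers named above; it says nothing about the FPGA datapath, and the names and hyperparameters are illustrative.

```python
import math


def aggregate(grads_per_gpu):
    """Element-wise mean of the gradient vectors sent by each GPU."""
    n = len(grads_per_gpu)
    return [sum(g) / n for g in zip(*grads_per_gpu)]


class Adam:
    """Standard Adam update on a flat list of parameters."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0                 # step counter for bias correction

    def step(self, params, grads):
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


# Four GPUs each send a 2-element gradient; the "switch" averages and updates.
g = aggregate([[1.0, -2.0], [3.0, -4.0], [5.0, -6.0], [7.0, -8.0]])
opt = Adam(2, lr=0.1)
params = opt.step([1.0, 1.0], g)
```

    On the first step Adam's bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr`.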

  • An Improved U-Net Architecture for Image Dehazing

    Wenyi GE  Yi LIN  Zhitao WANG  Guigui WANG  Shihan TAN  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2021/09/14
      Vol:
    E104-D No:12
      Page(s):
    2218-2225

    In this paper, we present a simple yet powerful deep neural network for natural image dehazing. The proposed method is based on the U-Net architecture, with several design changes to improve it. First, we use Group Normalization instead of Batch Normalization to address the problem of insufficient batch size caused by hardware limitations. Second, we introduce FReLU activation into the U-Net block, which allows regular convolutions to capture complicated visual layouts. Experimental results on public benchmarks demonstrate the effectiveness of the modified components. On the SOTS Indoor and Outdoor datasets, the network obtains PSNRs of 32.23 dB and 31.64 dB, respectively, which are comparable to those of state-of-the-art methods. The code will be made publicly available online.
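    The two modifications can be sketched on plain Python lists. Group Normalization computes statistics per group of channels within one sample, which is why it is insensitive to small batches; FReLU (the funnel activation) computes max(x, T(x)) with T a per-channel 3x3 convolution. Here the learnable affine parameters and the batch norm that normally follows T are omitted, and the kernel values are placeholders.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization for ONE sample.

    x is a list of C channels, each a flat list of H*W values. Mean and
    variance are computed per group of C // num_groups channels, so the
    result does not depend on the batch size. Scale/shift are omitted.
    """
    c = len(x)
    if c % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        vals = [v for ch in group for v in ch]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.extend([(v - mean) / math.sqrt(var + eps) for v in ch] for ch in group)
    return out


def frelu(img, kernel):
    """FReLU on one 2-D channel: y = max(x, T(x)), where T is a zero-padded
    3x3 cross-correlation (the "convolution" of deep learning frameworks)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            t = 0.0  # spatial condition T(x) at (i, j)
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        t += kernel[di + 1][dj + 1] * img[ii][jj]
            out[i][j] = max(img[i][j], t)
    return out
```

    With an all-zero kernel, T(x) = 0 and FReLU reduces to ReLU, which makes a convenient sanity check.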

  • Smaller Residual Network for Single Image Depth Estimation

    Andi HENDRA  Yasushi KANAZAWA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/08/17
      Vol:
    E104-D No:11
      Page(s):
    1992-2001

    We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network in this paper is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual block, followed by a pair of up-sampling layers. The simple decoder network, a stack of five convolution layers, takes the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter to improve performance. Furthermore, instead of using only a common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting can significantly reduce the number of network parameters while producing a more accurate depth map. We evaluated our approach through quantitative and qualitative comparisons with several related prior methods on the publicly available NYU and KITTI datasets.

  • DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

    Satoshi MIZOGUCHI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/07/30
      Vol:
    E104-D No:11
      Page(s):
    1971-1980

    We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. Musical noise is an artifact generated by nonlinear signal processing that negatively affects auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress musical noise generation and produce perceptually comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, we first define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order standardized moment and is known to correlate with the amount of musical noise. Kurtosis matching is a penalty term in DNN training that works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching, which uses moments of orders higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that sixth-moment matching also achieves low-musical-noise speech enhancement, as kurtosis matching does.
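    The moment-matching penalties can be written directly from their definitions. The abstract does not say which quantities the moments are computed over (for example, spectral power values in non-speech regions), so the sketch below takes plain lists of values; `moment_matching_penalty` and its squared-gap form are assumptions. In training, such a term would presumably be added to the usual enhancement loss with a weight.

```python
def standardized_moment(xs, order):
    """order-th standardized moment: E[(x - mean)^order] / std^order.
    order=4 gives the kurtosis; order=6 the sixth standardized moment."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** order for x in xs) / n / var ** (order / 2)


def moment_matching_penalty(enhanced, reference, order=4):
    """Hypothetical penalty term: squared gap between the standardized
    moments of the enhanced values and of a reference set of values."""
    return (standardized_moment(enhanced, order)
            - standardized_moment(reference, order)) ** 2


flat = [-1.0, 1.0, -1.0, 1.0]                        # no outliers
peaky = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]   # one isolated spike
print(standardized_moment(flat, 4), standardized_moment(peaky, 4))
```

    Isolated spikes, which is how musical noise appears in a spectrum, push the kurtosis up sharply, which is why matching it (or a higher-order moment) can act as a musical-noise penalty.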

  • Gradient Corrected Approximation for Binary Neural Networks

    Song CHENG  Zixuan LI  Yongsen WANG  Wanbing ZOU  Yumei ZHOU  Delong SHANG  Shushan QIAO  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2021/07/05
      Vol:
    E104-D No:10
      Page(s):
    1784-1788

    Binary neural networks (BNNs), in which both activations and weights are aggressively quantized to {-1, +1}, can massively accelerate the run-time performance of convolutional neural networks (CNNs) on edge devices by reducing computational complexity and saving memory footprint. However, the non-differentiable binarizing function used in BNNs makes the binarized models hard to optimize and causes significant performance degradation compared with full-precision models. Many previous works corrected the backward gradient of the binarizing function with various improved versions of the straight-through estimator (STE) or with a gradual approximation approach, but the gradient suppression problem was not analyzed or handled. We therefore propose a novel gradient corrected approximation (GCA) method that reduces the discrepancy between the binarizing function and its backward gradient in a gradual and stable way. Our work has two primary contributions. The first is to approximate the backward gradient of the binarizing function with a simple leaky-steep function with a variable window size. The second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% in Top-1 accuracy on the ImageNet dataset without introducing extra computation cost.
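    The two ingredients of GCA can be sketched for a single layer's pre-activations. The exact leaky-steep function and standardization used in the letter are not given in the abstract; below, the surrogate has slope 1/window inside the window and a small leak slope outside it, and "standardizing" is taken to mean rescaling to zero mean and unit variance. Both choices, and all names, are assumptions. Shrinking `window` over training would make the surrogate approach the true sign function gradually.

```python
import math


def binarize(x):
    """Forward pass of the binarizing function: sign with sign(0) = +1."""
    return 1.0 if x >= 0 else -1.0


def leaky_steep_grad(x, window, leak=0.01):
    """Assumed surrogate for d sign(x)/dx: slope 1/window inside |x| <= window
    (steeper as the window shrinks) and a small leak slope outside it."""
    return 1.0 / window if abs(x) <= window else leak


def corrected_backward(upstream, xs, window, leak=0.01, eps=1e-8):
    """Chain rule through the surrogate, then standardize the resulting
    gradients to zero mean and unit variance (our reading of 'standardizing')."""
    g = [u * leaky_steep_grad(x, window, leak) for u, x in zip(upstream, xs)]
    mean = sum(g) / len(g)
    std = math.sqrt(sum((v - mean) ** 2 for v in g) / len(g))
    return [(v - mean) / (std + eps) for v in g]


xs = [0.05, -0.05, 0.5, -0.5]
grads = corrected_backward([1.0] * 4, xs, window=0.1)
```

    Unlike the plain STE, which passes gradients only inside a fixed clipping range, the leak keeps a small gradient flowing for pre-activations outside the window.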

  • Image Emotion Recognition Using Visual and Semantic Features Reflecting Emotional and Similar Objects

    Takahisa YAMAMOTO  Shiki TAKEUCHI  Atsushi NAKAZAWA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/06/24
      Vol:
    E104-D No:10
      Page(s):
    1691-1701

    Visual sentiment analysis has many applications, including image captioning, opinion mining, and advertisement; however, it is still a difficult problem, and existing algorithms cannot produce satisfactory results. One of the difficulties in classifying images into emotions is that visual sentiments are evoked by different types of information: visual information, such as colors or textures, and semantic information, such as the types of objects evoking emotions and/or their combinations. In contrast to existing methods that use only visual information, this paper proposes a novel image emotion recognition algorithm that uses both types of information simultaneously. For semantic features, we introduce an object vector and a word vector. The object vector is created by an object detection method and reflects the objects present in an image. The word vector is created by transforming the names of detected objects through a word embedding model, so it is similar for semantically similar objects. These semantic features are concatenated with a visual feature produced by a fine-tuned convolutional neural network (CNN), and classification is performed on the concatenated feature vector. Extensive evaluation experiments on emotional image datasets show that our method achieves the best accuracy among the compared existing methods on all datasets but one. The accuracy improvement of our method over existing methods is up to 4.54%.
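    The feature fusion step amounts to concatenating three vectors. In the sketch below, the object vector counts each vocabulary object returned by a detector, and the word vector averages the embeddings of the detected names; counts versus presence flags, averaging, and the toy embeddings are assumptions, and the detector and CNN are stubbed out with fixed inputs.

```python
def object_vector(detected, vocabulary):
    """Number of detections of each vocabulary object in the image."""
    return [float(detected.count(name)) for name in vocabulary]


def word_vector(detected, embeddings):
    """Mean word embedding of the detected object names (zeros if none)."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[n] for n in detected if n in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fused_feature(cnn_feature, detected, vocabulary, embeddings):
    """Concatenate visual (CNN) and semantic (object + word) features."""
    return (list(cnn_feature)
            + object_vector(detected, vocabulary)
            + word_vector(detected, embeddings))


vocab = ["dog", "cake", "gun"]
emb = {"dog": [1.0, 0.0], "cake": [0.0, 1.0], "gun": [-1.0, -1.0]}  # toy embeddings
feat = fused_feature([0.5], ["dog", "dog", "cake"], vocab, emb)
```

    The fused vector would then be fed to an ordinary classifier; semantically close objects contribute similar word-vector components even when they are different vocabulary entries.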
